ir image
Multi-Resolution Analysis of the Convective Structure of Tropical Cyclones for Short-Term Intensity Guidance
Cucuzzella, Elizabeth, McNeely, Tria, Wood, Kimberly, Lee, Ann B.
Accurate tropical cyclone (TC) short-term intensity forecasting with a 24-hour lead time is essential for disaster mitigation in the Atlantic TC basin. Since most TCs evolve far from land-based observing networks, satellite imagery is critical to monitoring these storms; however, the complex, high-resolution spatial structures in such imagery can be challenging for forecasters to interpret qualitatively in real time. Here we propose a concise, interpretable, and descriptive approach that quantifies fine TC structures with a multi-resolution analysis (MRA) based on the discrete wavelet transform, enabling data analysts to identify physically meaningful structural features that strongly correlate with rapid intensity change. Furthermore, deep-learning techniques can build on this MRA for short-term intensity guidance.
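The paper's structural descriptors come from a discrete-wavelet-transform MRA of the IR imagery. As a minimal sketch of the idea (the wavelet family, decomposition level, and per-scale energy summary are illustrative assumptions, not the authors' exact configuration), one can decompose a single-band IR image with PyWavelets and summarize structure at each scale:

```python
import numpy as np
import pywt  # PyWavelets

# Synthetic stand-in for a single-band geostationary IR image of a TC.
ir_image = np.random.rand(512, 512).astype(np.float32)

# Multi-level 2D discrete wavelet transform: coeffs[0] is the coarse
# approximation; each subsequent entry is a (horizontal, vertical,
# diagonal) tuple of detail coefficients at successively finer scales.
coeffs = pywt.wavedec2(ir_image, wavelet="db2", level=3)

# One simple per-scale structural descriptor: total detail energy.
for scale, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
    energy = float((cH**2 + cV**2 + cD**2).sum())
    print(f"scale {scale}: detail energy = {energy:.2f}")
```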
CLEAR-IR: Clarity-Enhanced Active Reconstruction of Infrared Imagery
Shankar, Nathan, Ladosz, Pawel, Yin, Hujun
This paper presents a novel approach for enabling robust robotic perception in dark environments using an infrared (IR) stream. An IR stream is less susceptible to noise than RGB in low-light conditions; however, it is dominated by active emitter patterns that hinder high-level tasks such as object detection, tracking, and localisation. To address this, a U-Net-based architecture is proposed that reconstructs clean IR images from emitter-populated input, improving both image quality and downstream robotic performance. This approach outperforms existing enhancement techniques and enables reliable operation of vision-driven robotic systems across illumination conditions from well-lit to extreme low-light scenes. Lighting-invariant vision systems are desirable for enabling robots to operate robustly across diverse and unpredictable environments without requiring modifications to the underlying perception pipeline. To support high-level tasks such as object detection, semantic segmentation, and image classification, the vision system must remain reliable even in low-light or completely dark scenes. Such capabilities are critical in domains like mine shaft exploration, post-disaster victim identification, nuclear facility inspection, and visual loop closure in feature-deprived environments using ArUco markers.
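A minimal PyTorch sketch of the kind of encoder-decoder described (the depth, channel widths, and single-channel input are assumptions for illustration; the paper's exact architecture may differ):

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the standard U-Net building block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """One-level U-Net mapping an emitter-populated IR frame to a clean one."""
    def __init__(self):
        super().__init__()
        self.enc = block(1, 32)
        self.down = nn.MaxPool2d(2)
        self.mid = block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)   # 64 channels = skip (32) + upsampled (32)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))
        return self.out(d)

net = TinyUNet()
noisy_ir = torch.rand(1, 1, 128, 128)   # emitter-dot-covered input frame
clean_pred = net(noisy_ir)              # reconstructed IR image
```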
MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery
Zhu, Qinfeng, Fang, Yuan, Fan, Lei
Crack detection is a critical task in structural health monitoring, aimed at assessing the structural integrity of bridges, buildings, and roads to prevent potential failures. Vision-based crack detection has become the mainstream approach due to its ease of implementation and effectiveness. Fusing infrared (IR) channels with red, green and blue (RGB) channels can enhance feature representation and thus improve crack detection. However, IR and RGB channels often differ in resolution. To align them, higher-resolution RGB images typically need to be downsampled to match the IR image resolution, which leads to the loss of fine details. Moreover, crack detection performance is restricted by the limited receptive fields and high computational complexity of traditional image segmentation networks. Inspired by the recently proposed Mamba neural architecture, this study introduces a two-stage paradigm called MSCrackMamba, which leverages Vision Mamba along with a super-resolution network to address these challenges. Specifically, to align IR and RGB channels, we first apply super-resolution to IR channels to match the resolution of RGB channels for data fusion. Vision Mamba is then adopted as the backbone network, while UperNet is employed as the decoder for crack detection. Our approach is validated on the large-scale crack detection dataset Crack900, demonstrating an improvement of 3.55% in mIoU compared to the best-performing baseline methods.
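To make the two-stage idea concrete, here is a minimal sketch of the alignment-then-fusion step (bicubic interpolation stands in for the learned super-resolution network, and all shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

rgb = torch.rand(1, 3, 512, 512)   # higher-resolution RGB channels
ir = torch.rand(1, 1, 128, 128)    # lower-resolution IR channel

# Stage 1: bring IR up to RGB resolution. In MSCrackMamba this is a
# learned super-resolution network; bicubic upsampling is only a
# runnable placeholder so the fusion step below is self-contained.
ir_sr = F.interpolate(ir, size=rgb.shape[-2:], mode="bicubic",
                      align_corners=False)

# Stage 2: early fusion by channel concatenation; the fused 4-channel
# tensor would then feed the Vision Mamba backbone and UperNet decoder.
fused = torch.cat([rgb, ir_sr], dim=1)
print(fused.shape)                 # torch.Size([1, 4, 512, 512])
```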
Generative Model-Based Fusion for Improved Few-Shot Semantic Segmentation of Infrared Images
Infrared (IR) imaging is commonly used in various scenarios, including autonomous driving, fire safety, and defense applications. Thus, semantic segmentation of such images is of great interest. However, this task faces several challenges, including data scarcity, contrast and input-channel counts that differ from natural images, and the emergence of classes not represented in existing databases in certain scenarios, such as defense applications. Few-shot segmentation (FSS) provides a framework to overcome these issues by segmenting query images using a few labeled support samples. However, existing FSS models for IR images require paired visible RGB images, which is a major limitation since acquiring such paired data is difficult or impossible in some applications. In this work, we develop new strategies for FSS of IR images using generative modeling and fusion techniques. To this end, we propose to synthesize auxiliary data that provides additional channel information to complement the limited contrast of IR images, as well as to synthesize IR data for augmentation. The former helps the FSS model better capture the relationship between the support and query sets, while the latter addresses data scarcity. Finally, to further strengthen the former aspect, we propose a novel fusion ensemble module for integrating the two modalities. Our methods are evaluated on different IR datasets and improve upon state-of-the-art (SOTA) FSS models.
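A common building block in prototype-style FSS pipelines like those discussed here is masked average pooling of support features into a class prototype that is densely matched against query features; the sketch below illustrates only that generic step (feature shapes and cosine-similarity matching are assumptions, not this paper's full fusion ensemble):

```python
import torch
import torch.nn.functional as F

# Backbone features for support and query IR images (e.g., from a CNN).
support_feat = torch.rand(1, 256, 32, 32)   # (B, C, H, W)
query_feat = torch.rand(1, 256, 32, 32)
support_mask = (torch.rand(1, 1, 32, 32) > 0.5).float()  # binary label

# Masked average pooling: one C-dimensional prototype for the class.
proto = (support_feat * support_mask).sum(dim=(2, 3)) / \
        support_mask.sum(dim=(2, 3)).clamp(min=1e-6)     # (B, C)

# Dense cosine similarity between prototype and query features gives a
# coarse segmentation score map for the query image.
proto_map = proto[:, :, None, None].expand_as(query_feat)
score = F.cosine_similarity(query_feat, proto_map, dim=1)
print(score.shape)   # torch.Size([1, 32, 32])
```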
IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability
Sharma, Saksham, Raizada, Akshit, Sundaram, Suresh
Autonomous off-road navigation is required for applications in agriculture, construction, search and rescue, and defence. Traditional on-road autonomy methods struggle with dynamic terrain, leading to poor vehicle control off-road. Recent deep-learning models have used perception sensors along with kinesthetic feedback for navigation on such terrains, but this approach suffers from out-of-domain uncertainty: factors such as changes in weather and time of day degrade model performance. We propose a multimodal fusion network, FuseIsPath, that uses LWIR and RGB images to provide robustness against dynamic weather and lighting conditions. To aid further work in this domain, we also open-source a day-night dataset of LWIR and RGB images along with pseudo-labels for traversability. To co-register the two image streams, we developed a novel method for targetless extrinsic calibration of LWIR, LiDAR, and RGB cameras with a translation accuracy of 1.7 cm and a rotation accuracy of 0.827°.
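Once such extrinsics are estimated, co-registration reduces to a rigid transform plus a pinhole projection; a minimal sketch with made-up calibration values (the rotation, translation, and intrinsics below are placeholders, not the paper's calibration results):

```python
import numpy as np

# Assumed calibration outputs: rotation R and translation t mapping LiDAR
# points into the LWIR camera frame, plus a pinhole intrinsic matrix K.
R = np.eye(3)
t = np.array([0.10, 0.00, 0.05])
K = np.array([[400., 0., 320.],
              [0., 400., 256.],
              [0., 0., 1.]])

# Random LiDAR points placed in front of the camera.
points_lidar = np.random.rand(1000, 3) * 10.0 + np.array([0.0, 0.0, 2.0])

# Rigid transform into the camera frame, then pinhole projection.
p_cam = points_lidar @ R.T + t
uv = p_cam @ K.T
uv = uv[:, :2] / uv[:, 2:3]        # divide by depth -> pixel coordinates
valid = p_cam[:, 2] > 0            # keep points in front of the camera
print(uv[valid].shape)
```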
GAPartManip: A Large-scale Part-centric Dataset for Material-Agnostic Articulated Object Manipulation
Cui, Wenbo, Zhao, Chengyang, Wei, Songlin, Zhang, Jiazhao, Geng, Haoran, Chen, Yaran, Wang, He
Effectively manipulating articulated objects in household scenarios is a crucial step toward achieving general embodied artificial intelligence. Mainstream research in 3D vision has primarily focused on manipulation through depth perception and pose detection. However, in real-world environments, these methods often face challenges due to imperfect depth perception, such as with transparent lids and reflective handles. Moreover, they generally lack the diversity of part-based interactions required for flexible and adaptable manipulation. To address these challenges, we introduce a large-scale part-centric dataset for articulated object manipulation that features both photo-realistic material randomizations and detailed annotations of part-oriented, scene-level actionable interaction poses. We evaluate the effectiveness of our dataset by integrating it with several state-of-the-art methods for depth estimation and interaction pose prediction. Additionally, we propose a novel modular framework that delivers superior and robust performance for generalizable articulated object manipulation. Our extensive experiments demonstrate that our dataset significantly improves the performance of depth perception and actionable interaction pose prediction in both simulation and real-world scenarios.
OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis
Liu, Zixuan, Xu, Hanwen, Woicik, Addie, Shapiro, Linda G., Blazes, Marian, Wu, Yue, Lee, Cecilia S., Lee, Aaron Y., Wang, Sheng
Optical coherence tomography (OCT) has become critical for diagnosing retinal diseases as it enables 3D imaging of the retina and optic nerve. OCT acquisition is fast, non-invasive, affordable, and scalable. Due to its broad applicability, massive numbers of OCT images have accumulated in routine exams, making it possible to train large-scale foundation models that generalize to various diagnostic tasks using OCT images. Nevertheless, existing foundation models for OCT only consider 2D image slices, overlooking the rich 3D structure. Here, we present OCTCube, a 3D foundation model pre-trained on 26,605 3D OCT volumes encompassing 1.62 million 2D OCT images. OCTCube is built on 3D masked autoencoders and exploits FlashAttention to reduce the large GPU memory footprint of modeling 3D volumes. OCTCube outperforms 2D models when predicting 8 retinal diseases in both inductive and cross-dataset settings, indicating that exploiting the 3D structure rather than 2D slices yields significant improvement. OCTCube further shows superior performance on cross-device prediction and when predicting systemic diseases, such as diabetes and hypertension, further demonstrating its strong generalizability. Finally, we propose a contrastive-self-supervised-learning-based OCT-IR pre-training framework (COIP) for cross-modality analysis on OCT and infrared retinal (IR) images, where the OCT volumes are embedded using OCTCube. We demonstrate that COIP enables accurate alignment between OCT and IR en face images. Collectively, OCTCube, a 3D OCT foundation model, performs significantly better than 2D models on 27 of 29 tasks and comparably on the other two, paving the way for AI-based retinal disease diagnosis.
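For the cross-modality part, a CLIP-style symmetric InfoNCE objective is one standard way to align paired embeddings; the sketch below shows that generic formulation (batch size, embedding dimension, and temperature are assumptions, and COIP's exact objective may differ):

```python
import torch
import torch.nn.functional as F

# Paired embeddings: OCT volumes encoded by a volume encoder (OCTCube's
# role in COIP) and their matching IR en face images by an image encoder.
oct_emb = F.normalize(torch.rand(8, 512), dim=1)
ir_emb = F.normalize(torch.rand(8, 512), dim=1)

# Symmetric InfoNCE: matched OCT-IR pairs lie on the logits diagonal.
temperature = 0.07
logits = oct_emb @ ir_emb.t() / temperature
labels = torch.arange(8)
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.t(), labels)) / 2
print(float(loss))
```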
ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera
Shi, Jun, A, Yong, Jin, Yixiang, Li, Dingzhe, Niu, Haoyu, Jin, Zhezhu, Wang, He
In this paper, we tackle the problem of grasping transparent and specular objects. This problem is important yet remains unsolved in robotics because depth cameras fail to recover the accurate geometry of such objects. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo network for transparent object reconstruction, enabling material-agnostic object grasping in cluttered environments. In contrast to existing RGB-D-based grasp detection methods, which heavily depend on depth restoration networks and the quality of depth maps generated by depth cameras, our system distinguishes itself by directly utilizing raw IR and RGB images for transparent object geometry reconstruction. We create an extensive synthetic dataset through domain randomization based on GraspNet-1Billion. Our experiments demonstrate that ASGrasp achieves over a 90% success rate for generalizable transparent object grasping in both simulation and the real world via seamless sim-to-real transfer. Our method significantly outperforms SOTA networks and even surpasses the performance upper bound set by perfect visible point cloud inputs. Project page: https://pku-epic.github.io/ASGrasp
Thermal-NeRF: Neural Radiance Fields from an Infrared Camera
Ye, Tianxiang, Wu, Qi, Deng, Junyuan, Liu, Guoqing, Liu, Liu, Xia, Songpengcheng, Pang, Liang, Yu, Wenxian, Pei, Ling
In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representations for 3D scene reconstruction. However, the predominant reliance on RGB imaging presupposes ideal lighting conditions, a premise frequently unmet in robotic applications plagued by poor lighting or visual obstructions. This limitation overlooks the capabilities of infrared (IR) cameras, which excel in low-light detection and present a robust alternative under such adverse scenarios. To tackle these issues, we introduce Thermal-NeRF, the first method that estimates a volumetric scene representation in the form of a NeRF solely from IR imaging. By leveraging a thermal mapping and a structural thermal constraint derived from the thermal characteristics of IR imaging, our method recovers NeRFs in visually degraded scenes where RGB-based methods fall short. We conduct extensive experiments demonstrating that Thermal-NeRF achieves superior quality compared to existing methods. Furthermore, we contribute a dataset for IR-based NeRF applications, paving the way for future research in IR NeRF reconstruction.
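At its core, rendering such a field follows the standard NeRF alpha-compositing rule, with a single-channel thermal value replacing RGB color; a minimal sketch for one ray (sample counts and values are illustrative, and this omits the paper's thermal mapping and structural constraint):

```python
import torch

n = 64
sigma = torch.rand(n)            # predicted volume density per sample
thermal = torch.rand(n)          # predicted single-channel IR radiance
delta = torch.full((n,), 0.02)   # spacing between adjacent ray samples

# Standard NeRF compositing: alpha from density, transmittance from the
# cumulative product of (1 - alpha) over all preceding samples.
alpha = 1.0 - torch.exp(-sigma * delta)
trans = torch.cumprod(
    torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0)[:-1]
weights = trans * alpha
pixel_ir = (weights * thermal).sum()   # rendered IR intensity for the ray
print(float(pixel_ir))
```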
Simulating Nighttime Visible Satellite Imagery of Tropical Cyclones Using Conditional Generative Adversarial Networks
Yao, Jinghuai, Du, Puyuan, Zhao, Yucheng, Wang, Yubo
Visible (VIS) satellite imagery has various important applications in meteorology, including monitoring tropical cyclones (TCs). However, it is unavailable at night because of the lack of sunlight. This study presents a Conditional Generative Adversarial Network (CGAN) model that generates highly accurate nighttime visible reflectance using infrared (IR) bands and sunlight-direction parameters as input. The model was trained and validated using daytime target-area observations from the Advanced Himawari Imager (AHI). This study also presents the first nighttime model validation using the Day/Night Band (DNB) of the Visible/Infrared Imager Radiometer Suite (VIIRS). The daytime statistical results for the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Root Mean Square Error (RMSE), Correlation Coefficient (CC), and Bias are 0.885, 28.3, 0.0428, 0.984, and -0.0016, respectively, surpassing the model performance of previous studies. The nighttime statistical results for SSIM, PSNR, RMSE, and CC are 0.821, 24.4, 0.0643, and 0.969, respectively, slightly degraded by the parallax between satellites. We also performed full-disk validation, which shows that the model can be readily applied over Northern Hemisphere tropical oceans even in the absence of TCs. This model contributes to the nighttime monitoring of meteorological phenomena by providing accurate AI-generated visible imagery with adjustable virtual sunlight directions.
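The reported scores are all standard image-comparison statistics; a minimal sketch of how one might compute them for a generated-versus-observed reflectance pair (the random arrays are placeholders, with scikit-image supplying SSIM and PSNR):

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

# Reflectance fields in [0, 1]: stand-ins for generated vs. observed VIS.
generated = np.random.rand(256, 256).astype(np.float64)
observed = np.random.rand(256, 256).astype(np.float64)

ssim = structural_similarity(observed, generated, data_range=1.0)
psnr = peak_signal_noise_ratio(observed, generated, data_range=1.0)
rmse = float(np.sqrt(np.mean((generated - observed) ** 2)))
cc = float(np.corrcoef(generated.ravel(), observed.ravel())[0, 1])
bias = float(np.mean(generated - observed))
print(ssim, psnr, rmse, cc, bias)
```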